Overview

Dataset statistics

Number of variables: 14
Number of observations: 506
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 55.5 KiB
Average record size in memory: 112.3 B

Variable types

NUM: 13
BOOL: 1

Reproduction

Analysis started: 2020-06-04 14:31:20.989542
Analysis finished: 2020-06-04 14:31:41.140924
Duration: 20.15 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

TAX is highly correlated with RAD (High correlation)
RAD is highly correlated with TAX (High correlation)
ZN has 372 (73.5%) zeros (Zeros)

Variables

CRIM
Real number (ℝ≥0)

Distinct count: 504
Unique (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.613523557312254
Minimum: 0.00632
Maximum: 88.9762
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.00632
5-th percentile: 0.02791
Q1: 0.082045
median: 0.25651
Q3: 3.6770825
95-th percentile: 15.78915
Maximum: 88.9762
Range: 88.96988
Interquartile range (IQR): 3.5950375

Descriptive statistics

Standard deviation: 8.601545105
Coefficient of variation (CV): 2.380376098
Kurtosis: 37.13050913
Mean: 3.613523557
Median Absolute Deviation (MAD): 0.22145
Skewness: 5.223148798
Sum: 1828.44292
Variance: 73.9865782

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
14.3337             2      0.4%
0.01501             2      0.4%
0.08265             1      0.2%
0.537               1      0.2%
1.35472             1      0.2%
0.14103             1      0.2%
0.03502             1      0.2%
0.03615             1      0.2%
0.66351             1      0.2%
0.1265              1      0.2%
Other values (494)  494    97.6%

Value               Count  Frequency (%)
0.00632             1      0.2%
0.00906             1      0.2%
0.01096             1      0.2%
0.01301             1      0.2%
0.01311             1      0.2%

Value               Count  Frequency (%)
88.9762             1      0.2%
73.5341             1      0.2%
67.9208             1      0.2%
51.1358             1      0.2%
45.7461             1      0.2%
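
Several of the CRIM figures are derived from others in the same block. The following is a minimal sketch, assuming the usual definitions (range = maximum minus minimum, IQR = Q3 minus Q1, CV = standard deviation divided by mean, variance = standard deviation squared). The variable names are illustrative and not part of the report; the inputs are copied from the CRIM statistics.

```python
import math

# Values copied from the CRIM block of the report.
minimum, maximum = 0.00632, 88.9762
q1, q3 = 0.082045, 3.6770825
mean, std = 3.613523557, 8.601545105

value_range = maximum - minimum  # reported as 88.96988
iqr = q3 - q1                    # reported as 3.5950375
cv = std / mean                  # reported as 2.380376098
variance = std ** 2              # reported as 73.9865782

# Compare each recomputed value with the figure printed in the report.
checks = {
    "Range": math.isclose(value_range, 88.96988, abs_tol=1e-9),
    "IQR": math.isclose(iqr, 3.5950375, abs_tol=1e-9),
    "CV": math.isclose(cv, 2.380376098, abs_tol=1e-6),
    "Variance": math.isclose(variance, 73.9865782, abs_tol=1e-5),
}
print(checks)
```

The same relationships can be used to sanity-check the blocks of the other numeric variables.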
 

ZN
Real number (ℝ≥0)

ZEROS

Distinct count: 26
Unique (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.363636363636363
Minimum: 0.0
Maximum: 100.0
Zeros: 372
Zeros (%): 73.5%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 12.5
95-th percentile: 80
Maximum: 100
Range: 100
Interquartile range (IQR): 12.5

Descriptive statistics

Standard deviation: 23.32245299
Coefficient of variation (CV): 2.052375864
Kurtosis: 4.031510084
Mean: 11.36363636
Median Absolute Deviation (MAD): 0
Skewness: 2.225666323
Sum: 5750
Variance: 543.9368137

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   372    73.5%
20                  21     4.2%
80                  15     3.0%
12.5                10     2.0%
22                  10     2.0%
25                  10     2.0%
40                  7      1.4%
45                  6      1.2%
30                  6      1.2%
90                  5      1.0%
Other values (16)   44     8.7%

Value               Count  Frequency (%)
0                   372    73.5%
12.5                10     2.0%
17.5                1      0.2%
18                  1      0.2%
20                  21     4.2%

Value               Count  Frequency (%)
100                 1      0.2%
95                  4      0.8%
90                  5      1.0%
85                  2      0.4%
82.5                2      0.4%

INDUS
Real number (ℝ≥0)

Distinct count: 76
Unique (%): 15.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.13677865612648
Minimum: 0.46
Maximum: 27.74
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.46
5-th percentile: 2.18
Q1: 5.19
median: 9.69
Q3: 18.1
95-th percentile: 21.89
Maximum: 27.74
Range: 27.28
Interquartile range (IQR): 12.91

Descriptive statistics

Standard deviation: 6.860352941
Coefficient of variation (CV): 0.6160087358
Kurtosis: -1.233539601
Mean: 11.13677866
Median Absolute Deviation (MAD): 6.32
Skewness: 0.2950215679
Sum: 5635.21
Variance: 47.06444247

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
18.1                132    26.1%
19.58               30     5.9%
8.14                22     4.3%
6.2                 18     3.6%
21.89               15     3.0%
9.9                 12     2.4%
3.97                12     2.4%
8.56                11     2.2%
10.59               11     2.2%
5.86                10     2.0%
Other values (66)   233    46.0%

Value               Count  Frequency (%)
0.46                1      0.2%
0.74                1      0.2%
1.21                1      0.2%
1.22                1      0.2%
1.25                2      0.4%

Value               Count  Frequency (%)
27.74               5      1.0%
25.65               7      1.4%
21.89               15     3.0%
19.58               30     5.9%
18.1                132    26.1%

CHAS
Boolean

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value               Count  Frequency (%)
0                   471    93.1%
1                   35     6.9%

NOX
Real number (ℝ≥0)

Distinct count: 81
Unique (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5546950592885376
Minimum: 0.385
Maximum: 0.871
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.385
5-th percentile: 0.40925
Q1: 0.449
median: 0.538
Q3: 0.624
95-th percentile: 0.74
Maximum: 0.871
Range: 0.486
Interquartile range (IQR): 0.175

Descriptive statistics

Standard deviation: 0.1158776757
Coefficient of variation (CV): 0.2089033853
Kurtosis: -0.06466713337
Mean: 0.5546950593
Median Absolute Deviation (MAD): 0.0875
Skewness: 0.7293079225
Sum: 280.6757
Variance: 0.01342763572

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.538               23     4.5%
0.713               18     3.6%
0.437               17     3.4%
0.871               16     3.2%
0.489               15     3.0%
0.624               15     3.0%
0.693               14     2.8%
0.605               14     2.8%
0.74                13     2.6%
0.544               12     2.4%
Other values (71)   349    69.0%

Value               Count  Frequency (%)
0.385               1      0.2%
0.389               1      0.2%
0.392               2      0.4%
0.394               1      0.2%
0.398               2      0.4%

Value               Count  Frequency (%)
0.871               16     3.2%
0.77                8      1.6%
0.74                13     2.6%
0.718               6      1.2%
0.713               18     3.6%

RM
Real number (ℝ≥0)

Distinct count: 446
Unique (%): 88.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.284634387351779
Minimum: 3.561
Maximum: 8.78
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 3.561
5-th percentile: 5.314
Q1: 5.8855
median: 6.2085
Q3: 6.6235
95-th percentile: 7.5875
Maximum: 8.78
Range: 5.219
Interquartile range (IQR): 0.738

Descriptive statistics

Standard deviation: 0.7026171434
Coefficient of variation (CV): 0.1117992074
Kurtosis: 1.891500366
Mean: 6.284634387
Median Absolute Deviation (MAD): 0.3455
Skewness: 0.4036121333
Sum: 3180.025
Variance: 0.4936708502

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
6.167               3      0.6%
6.229               3      0.6%
6.127               3      0.6%
5.713               3      0.6%
6.417               3      0.6%
6.405               3      0.6%
6.38                2      0.4%
5.304               2      0.4%
5.983               2      0.4%
7.185               2      0.4%
Other values (436)  480    94.9%

Value               Count  Frequency (%)
3.561               1      0.2%
3.863               1      0.2%
4.138               2      0.4%
4.368               1      0.2%
4.519               1      0.2%

Value               Count  Frequency (%)
8.78                1      0.2%
8.725               1      0.2%
8.704               1      0.2%
8.398               1      0.2%
8.375               1      0.2%

AGE
Real number (ℝ≥0)

Distinct count: 356
Unique (%): 70.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.57490118577076
Minimum: 2.9
Maximum: 100.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 2.9
5-th percentile: 17.725
Q1: 45.025
median: 77.5
Q3: 94.075
95-th percentile: 100
Maximum: 100
Range: 97.1
Interquartile range (IQR): 49.05

Descriptive statistics

Standard deviation: 28.14886141
Coefficient of variation (CV): 0.410483441
Kurtosis: -0.9677155942
Mean: 68.57490119
Median Absolute Deviation (MAD): 19.55
Skewness: -0.5989626399
Sum: 34698.9
Variance: 792.3583985

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
100                 43     8.5%
96                  4      0.8%
98.2                4      0.8%
95.4                4      0.8%
97.9                4      0.8%
87.9                4      0.8%
98.8                4      0.8%
94.1                3      0.6%
88                  3      0.6%
21.4                3      0.6%
Other values (346)  430    85.0%

Value               Count  Frequency (%)
2.9                 1      0.2%
6                   1      0.2%
6.2                 1      0.2%
6.5                 1      0.2%
6.6                 2      0.4%

Value               Count  Frequency (%)
100                 43     8.5%
99.3                1      0.2%
99.1                1      0.2%
98.9                3      0.6%
98.8                4      0.8%

DIS
Real number (ℝ≥0)

Distinct count: 412
Unique (%): 81.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.795042687747036
Minimum: 1.1296
Maximum: 12.1265
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1.1296
5-th percentile: 1.461975
Q1: 2.100175
median: 3.20745
Q3: 5.188425
95-th percentile: 7.8278
Maximum: 12.1265
Range: 10.9969
Interquartile range (IQR): 3.08825

Descriptive statistics

Standard deviation: 2.105710127
Coefficient of variation (CV): 0.5548580872
Kurtosis: 0.4879411222
Mean: 3.795042688
Median Absolute Deviation (MAD): 1.29115
Skewness: 1.011780579
Sum: 1920.2916
Variance: 4.434015137

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
3.4952              5      1.0%
5.2873              4      0.8%
5.4007              4      0.8%
5.7209              4      0.8%
6.8147              4      0.8%
3.6519              3      0.6%
7.3172              3      0.6%
5.4917              3      0.6%
7.8278              3      0.6%
5.4159              3      0.6%
Other values (402)  470    92.9%

Value               Count  Frequency (%)
1.1296              1      0.2%
1.137               1      0.2%
1.1691              1      0.2%
1.1742              1      0.2%
1.1781              1      0.2%

Value               Count  Frequency (%)
12.1265             1      0.2%
10.7103             2      0.4%
10.5857             2      0.4%
9.2229              1      0.2%
9.2203              2      0.4%

RAD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 9
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.549407114624506
Minimum: 1.0
Maximum: 24.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 5
Q3: 24
95-th percentile: 24
Maximum: 24
Range: 23
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 8.707259384
Coefficient of variation (CV): 0.9118115166
Kurtosis: -0.8672319936
Mean: 9.549407115
Median Absolute Deviation (MAD): 2
Skewness: 1.004814648
Sum: 4832
Variance: 75.81636598

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
24                  132    26.1%
5                   115    22.7%
4                   110    21.7%
3                   38     7.5%
6                   26     5.1%
8                   24     4.7%
2                   24     4.7%
1                   20     4.0%
7                   17     3.4%

Value               Count  Frequency (%)
1                   20     4.0%
2                   24     4.7%
3                   38     7.5%
4                   110    21.7%
5                   115    22.7%

Value               Count  Frequency (%)
24                  132    26.1%
8                   24     4.7%
7                   17     3.4%
6                   26     5.1%
5                   115    22.7%

TAX
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 66
Unique (%): 13.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 408.2371541501976
Minimum: 187.0
Maximum: 711.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 187
5-th percentile: 222
Q1: 279
median: 330
Q3: 666
95-th percentile: 666
Maximum: 711
Range: 524
Interquartile range (IQR): 387

Descriptive statistics

Standard deviation: 168.5371161
Coefficient of variation (CV): 0.4128411987
Kurtosis: -1.142407992
Mean: 408.2371542
Median Absolute Deviation (MAD): 73
Skewness: 0.6699559418
Sum: 206568
Variance: 28404.75949

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
666                 132    26.1%
307                 40     7.9%
403                 30     5.9%
437                 15     3.0%
304                 14     2.8%
264                 12     2.4%
398                 12     2.4%
277                 11     2.2%
384                 11     2.2%
330                 10     2.0%
Other values (56)   219    43.3%

Value               Count  Frequency (%)
187                 1      0.2%
188                 7      1.4%
193                 8      1.6%
198                 1      0.2%
216                 5      1.0%

Value               Count  Frequency (%)
711                 5      1.0%
666                 132    26.1%
469                 1      0.2%
437                 15     3.0%
432                 9      1.8%

PTRATIO
Real number (ℝ≥0)

Distinct count: 46
Unique (%): 9.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.455533596837945
Minimum: 12.6
Maximum: 22.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 12.6
5-th percentile: 14.7
Q1: 17.4
median: 19.05
Q3: 20.2
95-th percentile: 21
Maximum: 22
Range: 9.4
Interquartile range (IQR): 2.8

Descriptive statistics

Standard deviation: 2.164945524
Coefficient of variation (CV): 0.1173060379
Kurtosis: -0.2850913833
Mean: 18.4555336
Median Absolute Deviation (MAD): 1.15
Skewness: -0.8023249269
Sum: 9338.5
Variance: 4.686989121

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
20.2                140    27.7%
14.7                34     6.7%
21                  27     5.3%
17.8                23     4.5%
19.2                19     3.8%
17.4                18     3.6%
18.6                17     3.4%
19.1                17     3.4%
16.6                16     3.2%
18.4                16     3.2%
Other values (36)   179    35.4%

Value               Count  Frequency (%)
12.6                3      0.6%
13                  12     2.4%
13.6                1      0.2%
14.4                1      0.2%
14.7                34     6.7%

Value               Count  Frequency (%)
22                  2      0.4%
21.2                15     3.0%
21.1                1      0.2%
21                  27     5.3%
20.9                11     2.2%

B
Real number (ℝ≥0)

Distinct count: 357
Unique (%): 70.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 356.6740316205534
Minimum: 0.32
Maximum: 396.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.32
5-th percentile: 84.59
Q1: 375.3775
median: 391.44
Q3: 396.225
95-th percentile: 396.9
Maximum: 396.9
Range: 396.58
Interquartile range (IQR): 20.8475

Descriptive statistics

Standard deviation: 91.29486438
Coefficient of variation (CV): 0.255961624
Kurtosis: 7.226817549
Mean: 356.6740316
Median Absolute Deviation (MAD): 5.46
Skewness: -2.890373712
Sum: 180477.06
Variance: 8334.752263

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
396.9               121    23.9%
395.24              3      0.6%
393.74              3      0.6%
393.23              2      0.4%
394.72              2      0.4%
396.21              2      0.4%
395.69              2      0.4%
396.06              2      0.4%
395.63              2      0.4%
395.6               2      0.4%
Other values (347)  365    72.1%

Value               Count  Frequency (%)
0.32                1      0.2%
2.52                1      0.2%
2.6                 1      0.2%
3.5                 1      0.2%
3.65                1      0.2%

Value               Count  Frequency (%)
396.9               121    23.9%
396.42              1      0.2%
396.33              1      0.2%
396.3               1      0.2%
396.28              1      0.2%

LSTAT
Real number (ℝ≥0)

Distinct count: 455
Unique (%): 89.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.653063241106722
Minimum: 1.73
Maximum: 37.97
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1.73
5-th percentile: 3.7075
Q1: 6.95
median: 11.36
Q3: 16.955
95-th percentile: 26.8075
Maximum: 37.97
Range: 36.24
Interquartile range (IQR): 10.005

Descriptive statistics

Standard deviation: 7.141061511
Coefficient of variation (CV): 0.5643741263
Kurtosis: 0.4932395174
Mean: 12.65306324
Median Absolute Deviation (MAD): 4.795
Skewness: 0.9064600936
Sum: 6402.45
Variance: 50.99475951

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
14.1                3      0.6%
6.36                3      0.6%
18.13               3      0.6%
8.05                3      0.6%
7.79                3      0.6%
9.5                 2      0.4%
4.59                2      0.4%
3.76                2      0.4%
17.27               2      0.4%
10.11               2      0.4%
Other values (445)  481    95.1%

Value               Count  Frequency (%)
1.73                1      0.2%
1.92                1      0.2%
1.98                1      0.2%
2.47                1      0.2%
2.87                1      0.2%

Value               Count  Frequency (%)
37.97               1      0.2%
36.98               1      0.2%
34.77               1      0.2%
34.41               1      0.2%
34.37               1      0.2%

MEDV
Real number (ℝ≥0)

Distinct count: 229
Unique (%): 45.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.532806324110677
Minimum: 5.0
Maximum: 50.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 5
5-th percentile: 10.2
Q1: 17.025
median: 21.2
Q3: 25
95-th percentile: 43.4
Maximum: 50
Range: 45
Interquartile range (IQR): 7.975

Descriptive statistics

Standard deviation: 9.197104087
Coefficient of variation (CV): 0.408165053
Kurtosis: 1.495196944
Mean: 22.53280632
Median Absolute Deviation (MAD): 4
Skewness: 1.108098408
Sum: 11401.6
Variance: 84.58672359

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
50                  16     3.2%
25                  8      1.6%
23.1                7      1.4%
21.7                7      1.4%
22                  7      1.4%
20.6                6      1.2%
19.4                6      1.2%
20.1                5      1.0%
19.6                5      1.0%
19.3                5      1.0%
Other values (219)  434    85.8%

Value               Count  Frequency (%)
5                   2      0.4%
5.6                 1      0.2%
6.3                 1      0.2%
7                   2      0.4%
7.2                 3      0.6%

Value               Count  Frequency (%)
50                  16     3.2%
48.8                1      0.2%
48.5                1      0.2%
48.3                1      0.2%
46.7                1      0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
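
The calculation described above can be sketched in plain Python. This is an illustrative sketch rather than the code the report itself runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# Made-up, nearly linear sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale: both values agree up to floating-point rounding.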

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
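
As a rough sketch of this definition, ρ can be obtained by ranking both variables and applying Pearson's formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are assumptions for illustration; tied values are given the mean of their rank positions, which is one common convention.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this monotonic but nonlinear data, ρ reaches 1 while Pearson's r stays below 1, which is the difference the paragraph above describes.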

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
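
The pair-counting definition above can be sketched directly. This is a minimal illustration of the formula as stated (often called tau-a), with an assumed function name `kendall_tau` and made-up data. Library implementations commonly apply a tie correction that this sketch does not attempt.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1  # both variables order the pair the same way
        elif product < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # product == 0 means a tie in xs or ys; such pairs count as neither.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # two adjacent swaps relative to x

tau = kendall_tau(x, y)
print(tau)
```

With 10 pairs in total, of which 8 are concordant and 2 discordant, the result is (8 - 2) / 10.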

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

Sample

First rows

Row  CRIM     ZN    INDUS  CHAS  NOX    RM     AGE    DIS     RAD  TAX    PTRATIO  B       LSTAT  MEDV
0    0.00632  18.0  2.31   0.0   0.538  6.575  65.2   4.0900  1.0  296.0  15.3     396.90  4.98   24.0
1    0.02731  0.0   7.07   0.0   0.469  6.421  78.9   4.9671  2.0  242.0  17.8     396.90  9.14   21.6
2    0.02729  0.0   7.07   0.0   0.469  7.185  61.1   4.9671  2.0  242.0  17.8     392.83  4.03   34.7
3    0.03237  0.0   2.18   0.0   0.458  6.998  45.8   6.0622  3.0  222.0  18.7     394.63  2.94   33.4
4    0.06905  0.0   2.18   0.0   0.458  7.147  54.2   6.0622  3.0  222.0  18.7     396.90  5.33   36.2
5    0.02985  0.0   2.18   0.0   0.458  6.430  58.7   6.0622  3.0  222.0  18.7     394.12  5.21   28.7
6    0.08829  12.5  7.87   0.0   0.524  6.012  66.6   5.5605  5.0  311.0  15.2     395.60  12.43  22.9
7    0.14455  12.5  7.87   0.0   0.524  6.172  96.1   5.9505  5.0  311.0  15.2     396.90  19.15  27.1
8    0.21124  12.5  7.87   0.0   0.524  5.631  100.0  6.0821  5.0  311.0  15.2     386.63  29.93  16.5
9    0.17004  12.5  7.87   0.0   0.524  6.004  85.9   6.5921  5.0  311.0  15.2     386.71  17.10  18.9

Last rows

Row  CRIM     ZN    INDUS  CHAS  NOX    RM     AGE    DIS     RAD  TAX    PTRATIO  B       LSTAT  MEDV
496  0.28960  0.0   9.69   0.0   0.585  5.390  72.9   2.7986  6.0  391.0  19.2     396.90  21.14  19.7
497  0.26838  0.0   9.69   0.0   0.585  5.794  70.6   2.8927  6.0  391.0  19.2     396.90  14.10  18.3
498  0.23912  0.0   9.69   0.0   0.585  6.019  65.3   2.4091  6.0  391.0  19.2     396.90  12.92  21.2
499  0.17783  0.0   9.69   0.0   0.585  5.569  73.5   2.3999  6.0  391.0  19.2     395.77  15.10  17.5
500  0.22438  0.0   9.69   0.0   0.585  6.027  79.7   2.4982  6.0  391.0  19.2     396.90  14.33  16.8
501  0.06263  0.0   11.93  0.0   0.573  6.593  69.1   2.4786  1.0  273.0  21.0     391.99  9.67   22.4
502  0.04527  0.0   11.93  0.0   0.573  6.120  76.7   2.2875  1.0  273.0  21.0     396.90  9.08   20.6
503  0.06076  0.0   11.93  0.0   0.573  6.976  91.0   2.1675  1.0  273.0  21.0     396.90  5.64   23.9
504  0.10959  0.0   11.93  0.0   0.573  6.794  89.3   2.3889  1.0  273.0  21.0     393.45  6.48   22.0
505  0.04741  0.0   11.93  0.0   0.573  6.030  80.8   2.5050  1.0  273.0  21.0     396.90  7.88   11.9